A Syntax-first Approach to High-quality Morphological Analysis and Lemma Disambiguation for the TüBa-D/Z Treebank
Abstract
Morphological analyses and lemma information are an important auxiliary resource for any treebank, especially for morphologically rich languages, since such information is a useful precondition for any task that needs to link surface forms to semantic interpretation (either through wordnets or distributional measures). In contrast to common practice in parsing, the method used in the TüBa-D/Z treebank uses syntactic information for morphological and lemma disambiguation. We argue that this approach has an advantage in the context of treebanking, since many ambiguities in morphology and lemmas can be eliminated given the syntactic context.